The Twofish Encryption Algorithm: A 128-Bit Block Cipher
by Bruce Schneier ; John Kelsey ; Doug Whiting ; David Wagner ; Chris Hall ; Niels Ferguson
Wiley Computer Publishing, John Wiley & Sons, Inc.
ISBN: 0471353817   Pub Date: 03/01/99
  



Chapter 5
Performance of Twofish

Twofish has been designed from the start with performance in mind. It is efficient on a variety of platforms: 32-bit CPUs, 8-bit smart cards, and dedicated VLSI hardware. More importantly, though, Twofish has been designed to allow several layers of performance tradeoffs, depending on the relative importance of encryption speed, key setup, memory use, hardware gate count, and other implementation parameters. The result is a highly flexible algorithm that can be implemented efficiently in a variety of cryptographic applications.

All these options are interoperable; these are simply implementation tradeoffs and do not affect the mathematics of Twofish. One end of a communication could use the fastest Pentium II implementation, and the other the cheapest hardware implementation.

5.1 Performance on Large Microprocessors

Table 5.1 gives Twofish’s encryption and decryption performance for different key-scheduling options on several modern microprocessors, using different languages and compilers. The table shows our results for many different implementations, one implementation per line. The first column gives the CPU the implementation was run on (PPro/II = Pentium Pro/Pentium II, U-SPARC = UltraSPARC, PPC = PowerPC). The second column is the programming language (ASM = assembly language, MS C = Microsoft Visual C++ 4.2, BC = Borland C 5.0, C = standard C compiler). The keying options are explained below. The code size column contains the approximate total code size (in bytes) of the routines for encryption, decryption, and key setup, where available. All remaining numbers in the row are in clock cycles: for each key size we show the number of clock cycles required for key setup and the number required to encrypt a single block. Encryption and decryption times are identical in assembly; in C, encryption is slightly slower than decryption, so only the encryption (i.e., the larger) number is given. Apart from key setup, no time is required to set up the algorithm, and changing a key takes the same time as setting up a key.

For example, on a Pentium Pro a fully optimized assembly-language version of Twofish can encrypt or decrypt data in 258 clock cycles per block, or 16.1 clock cycles per byte, after an 8600-clock key setup for a 128-bit key (equivalent to encrypting about 33 blocks). On a 200 MHz Pentium Pro microprocessor, this translates to a throughput of about 99 Mbits/sec.
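The conversions in this example can be checked with a few lines of Python. The sketch below turns a per-block cycle count into cycles per byte, bulk throughput at a given clock rate, and key setup expressed as an equivalent number of block encryptions. The inputs (258 clocks per block and 8600 clocks of key setup) are the 128-bit-key figures from the PPro/II ASM "Comp." row of Table 5.1, and the 200 MHz clock rate is the one quoted above; the function names are hypothetical helpers, not part of any Twofish implementation.

```python
# Hypothetical helpers for converting Table 5.1 clock counts into rates.
BLOCK_BYTES = 16          # Twofish encrypts 128-bit (16-byte) blocks
CLOCK_HZ = 200_000_000    # 200 MHz Pentium Pro, the example clock rate


def cycles_per_byte(cycles_per_block: float) -> float:
    """Clock cycles spent per byte when one block takes cycles_per_block."""
    return cycles_per_block / BLOCK_BYTES


def throughput_mbit_s(cycles_per_block: float, clock_hz: float = CLOCK_HZ) -> float:
    """Bulk encryption rate in Mbit/s (1 Mbit = 10**6 bits), ignoring key setup."""
    blocks_per_second = clock_hz / cycles_per_block
    return blocks_per_second * BLOCK_BYTES * 8 / 1e6


def setup_in_blocks(setup_cycles: float, cycles_per_block: float) -> float:
    """Key setup cost expressed as the number of blocks it could have encrypted."""
    return setup_cycles / cycles_per_block


if __name__ == "__main__":
    enc, setup = 258, 8600  # PPro/II ASM "Comp." row, 128-bit key
    print(f"{cycles_per_byte(enc):.1f} clocks/byte")
    print(f"{throughput_mbit_s(enc):.1f} Mbit/s")
    print(f"{setup_in_blocks(setup, enc):.1f} blocks of key setup")
```

Running it prints 16.1 clocks per byte, 99.2 Mbit/s, and a key setup worth about 33.3 blocks. The model uses only the two clock counts from the table and nothing else.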

Processor Lang  Keying   Code    Key setup (clocks)     Encrypt (clocks/block)
                Option   Size    by key length (bits)   by key length (bits)
                         (bytes) 128    192    256      128   192   256
PPro/II   ASM   Comp.    9000    8600   11300  14100    258   258   258
PPro/II   ASM   Full     8500    7600   10400  13200    315   315   315
PPro/II   ASM   Part.    10700   4900   7600   10500    460   460   460
PPro/II   ASM   Min.     13600   2400   5300   8200     720   720   720
PPro/II   ASM   Zero     9100    1250   1600   2000     860   1130  1420
PPro/II   MS C  Full     11200   8000   11200  15700    600   600   600
PPro/II   MS C  Part.    13200   7100   9700   14100    800   800   800
PPro/II   MS C  Min.     16600   3000   7800   12200    1130  1130  1130
PPro/II   MS C  Zero     10500   2450   3200   4000     1310  1750  2200
PPro/II   BC    Full     14100   10300  13600  18800    640   640   640
PPro/II   BC    Part.    14300   9500   11200  16600    840   840   840
PPro/II   BC    Min.     17300   4600   10300  15300    1160  1160  1160
PPro/II   BC    Zero     10100   3200   4200   4800     1910  2670  3470
Pentium   ASM   Comp.    9100    12300  14600  17100    290   290   290
Pentium   ASM   Full     8200    11000  13500  16200    315   315   315
Pentium   ASM   Part.    10300   5500   7800   9800     430   430   430
Pentium   ASM   Min.     12600   3700   5900   7900     740   740   740
Pentium   ASM   Zero     8700    1800   2100   2600     1000  1300  1600
Pentium   MS C  Full     11800   11900  15100  21500    630   630   630
Pentium   MS C  Part.    14100   9200   13400  19800    900   900   900
Pentium   MS C  Min.     17800   3800   11100  16900    1460  1460  1460
Pentium   MS C  Zero     11300   2800   3900   4900     1740  2260  2760
Pentium   BC    Full     12700   14200  18100  26100    870   870   870
Pentium   BC    Part.    14200   11200  16500  24100    1100  1100  1100
Pentium   BC    Min.     17500   4700   12100  19200    1860  1860  1860
Pentium   BC    Zero     11800   3700   4900   6100     2150  2730  3270
U-SPARC   C     Full     -       16600  21600  24900    750   750   750
U-SPARC   C     Part.    -       8300   13300  19900    930   930   930
U-SPARC   C     Min.     -       3300   11600  16600    1200  1200  1200
U-SPARC   C     Zero     -       1700   3300   5000     1450  1680  1870
PPC 750   C     Full     -       12200  17100  22200    590   590   590
PPC 750   C     Part.    -       7800   12200  17300    780   780   780
PPC 750   C     Min.     -       2900   9100   14200    1280  1280  1280
PPC 750   C     Zero     -       2500   3600   4900     1030  1580  2040
68040     C     Full     16700   53000  63500  96700    3500  3500  3500
68040     C     Part.    18100   36700  47500  78500    4900  4900  4900
68040     C     Min.     23300   11000  40000  71800    8150  8150  8150
68040     C     Zero     16200   9800   13300  17000    6800  8600  10400
Table 5.1. Twofish Performance with Different Key Lengths and Options
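Because changing a key costs the same as setting one up, the fastest keying option depends on how many blocks are encrypted under each key. A simple model charges setup + n × (clocks per block) for one key followed by n blocks. The Python sketch below applies that model to the 128-bit-key PPro/II assembly rows of Table 5.1. The model and the function names are illustrative assumptions, and the sketch deliberately ignores memory use and code size, which the text lists as separate tradeoffs.

```python
# Illustrative cost model over Table 5.1 data (PPro/II, ASM, 128-bit key).
# Maps keying option -> (key setup clocks, clocks per 16-byte block).
PPRO_ASM_128 = {
    "Comp.": (8600, 258),
    "Full":  (7600, 315),
    "Part.": (4900, 460),
    "Min.":  (2400, 720),
    "Zero":  (1250, 860),
}


def total_clocks(setup: int, per_block: int, n_blocks: int) -> int:
    """Clocks to set up one key and then encrypt n_blocks under it."""
    return setup + per_block * n_blocks


def cheapest_option(table: dict[str, tuple[int, int]], n_blocks: int) -> str:
    """Keying option with the lowest total clock count for n_blocks per key."""
    return min(table, key=lambda opt: total_clocks(*table[opt], n_blocks))


if __name__ == "__main__":
    for n in (1, 9, 10, 19, 1000):
        best = cheapest_option(PPRO_ASM_128, n)
        cost = total_clocks(*PPRO_ASM_128[best], n)
        print(f"{n:5d} blocks per key: {best:5s} ({cost} clocks)")
```

Under this model, Zero keying has the lowest total for up to 8 blocks per key, Min. for exactly 9, Part. for 10 through 18, and Comp. from 19 blocks on; Full never has the lowest clock count for these rows.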

